36 research outputs found

    Big Data and Causality

    The file attached to this record is the author's final peer-reviewed version. The publisher's final version can be found by following the DOI link.
    Causality analysis remains one of the fundamental research questions and the ultimate objective of a great many scientific studies. With the rapid progress of science and technology, the age of big data has significantly influenced causality analysis across disciplines, especially over the last decade, because the complexity and difficulty of identifying causality in big data have increased dramatically. Data mining, the process of uncovering hidden information from big data, is now an important tool for causality analysis and has been extensively exploited by scholars around the world. The primary aim of this paper is to provide a concise review of causality analysis in big data. To this end, the paper reviews recent significant applications of data mining techniques in causality analysis, covering a substantial body of research to date, presented in chronological order with an overview table of data mining applications in the causality analysis domain as a reference directory.

    Comparison of Ensemble-Based Multiple Instance Learning Approaches

    Multiple instance learning (MIL) is concerned with learning from a training set of bags, each containing multiple feature vectors. Various algorithms have been proposed to solve the multiple instance problem. Recently, ensemble learning has become one of the most preferred machine learning techniques because of its high classification ability. The main goal of ensemble learning is to combine multiple learning models and obtain a decision from the outputs of all these models. With this motivation, the study presented in this paper proposes an ensemble-based multiple instance learning approach that merges standard algorithms (MIWrapper and SimpleMI) with ensemble learning methods (Bagging and AdaBoost) to improve classification ability. The proposed approach builds ensembles of MIWrapper and SimpleMI learners, using Naive Bayes, Support Vector Machines (SVM), Neural Networks (Multilayer Perceptron (MLP)), and Decision Tree (C4.5) as base classifiers. In the experimental studies, the proposed ensemble-based approach was compared with the individual MIWrapper and SimpleMI algorithms in terms of accuracy. The results indicate that the ensemble-based approach achieves higher classification ability than the conventional solutions. © 2019 IEEE
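A minimal sketch of the idea described in this abstract, not the authors' implementation. It assumes SimpleMI in its common form (each bag is summarized by the per-feature mean of its instances) and combines it with Bagging. The base learner here is a toy nearest-centroid classifier swapped in for brevity; it is not one of the classifiers named in the abstract, and all function names are hypothetical.

```python
import random
from collections import Counter
from math import dist


def simple_mi(bag):
    """Collapse a bag (list of equal-length feature vectors) into its per-feature mean."""
    n = len(bag)
    return tuple(sum(column) / n for column in zip(*bag))


def fit_centroids(vectors, labels):
    """Toy base learner (assumption): store one mean vector per class."""
    grouped = {}
    for vector, label in zip(vectors, labels):
        grouped.setdefault(label, []).append(vector)
    return {label: simple_mi(group) for label, group in grouped.items()}


def predict_centroid(model, vector):
    """Return the class whose centroid is closest to the vector."""
    return min(model, key=lambda label: dist(model[label], vector))


def bagging_fit(bags, labels, n_models=11, seed=0):
    """Train n_models base learners, each on a bootstrap resample of the bags."""
    rng = random.Random(seed)
    vectors = [simple_mi(bag) for bag in bags]
    models = []
    for _ in range(n_models):
        sample = [rng.randrange(len(vectors)) for _ in vectors]
        models.append(
            fit_centroids([vectors[i] for i in sample], [labels[i] for i in sample])
        )
    return models


def bagging_predict(models, bag):
    """Classify a bag by majority vote over the ensemble."""
    summary = simple_mi(bag)
    votes = Counter(predict_centroid(model, summary) for model in models)
    return votes.most_common(1)[0][0]
```

An MIWrapper-style variant would instead give every instance its bag's label, train on the instances, and average the instance-level predictions per bag. AdaBoost would replace the bootstrap resampling with reweighting of misclassified bags.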

    K-Linkage: A New Agglomerative Approach for Hierarchical Clustering

    In agglomerative hierarchical clustering, the traditional approaches to computing cluster distances are single, complete, average and centroid linkage. However, the single-link and complete-link approaches cannot always reflect the true underlying relationship between clusters, because they consider only a single pair of observations between two clusters. This may promote the formation of spurious clusters. To overcome the problem, this paper proposes a novel approach, named k-Linkage, which calculates the distance by considering k observations from two clusters separately. The article also introduces two novel concepts: k-min linkage (the average of the k closest pairs) and k-max linkage (the average of the k farthest pairs). In the experimental studies, the improved hierarchical clustering algorithm based on k-Linkage was executed on five well-known benchmark datasets with varying k values to demonstrate its efficiency. The results show that the proposed k-Linkage method can often produce clusters with better accuracy than the single, complete, average and centroid linkages.
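The two linkage definitions in this abstract can be sketched as follows. This is an illustrative reading, not the paper's code: it assumes Euclidean distance, takes "pairs" to mean all cross-cluster pairs of observations, and clamps k to the number of available pairs. The function name `k_linkage` and its parameters are hypothetical.

```python
from itertools import product
from math import dist


def k_linkage(cluster_a, cluster_b, k, mode="min"):
    """Average of the k closest (mode="min") or k farthest (mode="max")
    cross-cluster pairwise distances."""
    if k < 1:
        raise ValueError("k must be at least 1")
    if mode not in ("min", "max"):
        raise ValueError("mode must be 'min' or 'max'")
    # All pairwise distances between the two clusters, in ascending order.
    distances = sorted(dist(p, q) for p, q in product(cluster_a, cluster_b))
    k = min(k, len(distances))  # assumption: clamp k to the number of pairs
    chosen = distances[:k] if mode == "min" else distances[-k:]
    return sum(chosen) / k
```

Under this reading, k = 1 reduces k-min linkage to single linkage and k-max linkage to complete linkage, while setting k to the total number of pairs gives average linkage in both modes.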

    Integrating Cluster Analysis to the ARIMA Model for Forecasting Geosensor Data

    Clustering geosensor data is a problem that has recently attracted a large amount of research. In this paper, we focus on clustering geophysical time series data measured by a geosensor network. Clusters are built by accounting for both the spatial and the temporal information in the data. We use the clusters to produce globally meaningful information from the time series obtained by individual sensors. The cluster information is integrated into the ARIMA model to yield accurate forecasting results. Experiments investigate the trade-off between the accuracy and efficiency of the proposed algorithm.

    Stress Modelling Using Transfer Learning in Presence of Scarce Data

    Stress at work is a significant occupational health concern. Researchers are therefore seeking comprehensive approaches to improving wellness interventions relevant to stress. Recent studies have inferred stress in labour settings by modelling stress behaviour from non-obtrusive data obtained from smartphones. However, if the data for a subject is scarce, a good model cannot be obtained. We propose an approach based on transfer learning for building a model of a subject with scarce data. It compares decision trees to select the closest subject for knowledge transfer. We present a study carried out on 30 employees within two organisations. The results show that, when a “similar” subject is identified, the classification accuracy is improved via transfer learning.

    A fuzzy index for detecting spatiotemporal outliers

    The detection of spatial outliers helps extract important and valuable information from large spatial datasets. Most existing work on outlier detection views the condition of being an outlier as a binary property. However, in many scenarios it is more meaningful to assign each object a degree of being an outlier. The temporal dimension should also be taken into consideration. In this paper, we formally introduce a new notion of spatial outliers. We discuss the spatiotemporal outlier detection problem, and we design a methodology to discover these outliers effectively. We introduce a new index called the fuzzy outlier index, FoI, which expresses the degree to which a spatial object belongs to a spatiotemporal neighbourhood. The proposed outlier detection method can be applied to phenomena evolving over time, such as moving objects, pedestrian modelling or credit card fraud.

    Chain-detection Between Clusters
